Abstract — Although there is an increasing number of online tests in the world, little research on them is currently 
available in Spain. Assessment has become an integral part of education and the implications of the 
various uses of language testing go beyond the educational settings (Douglas, 2010; Fulcher, 2010). This study 
describes the PAULEX project. This project deals with the design and validation of a computer-based 
University Entrance examination for Spanish universities. The test measures the students’ oral and written 
competence including tasks on four different skills: reading, writing, listening and speaking. This paper will 
address some specific features and reaction studies that are already in progress. It is by no means this paper’s 
aim to give detailed information on each individual aspect, but rather to give an overview of what has been 
done so far and to outline the current research. To that end, the paper is divided into three main 
sections: technological advances, attitudinal observations and results. 

Index Terms — testing, English as a foreign language, computers, attitudes 



I. Introduction 

Although there is an increasing number of online tests in the world, little research on them is currently available in Spain. 
Assessment has become an integral part of education and the implications of the various uses of language testing go 
beyond the educational settings (Douglas, 2010; Fulcher, 2010). The reasons could be that computers require a financial 
effort from the research institution at the outset and that research in a new field requires specialists from more 
areas than those demanded for traditional testing (Bachman, 2000; Hambleton & Slater, 1997). Probably, these are also 
the reasons why the Spanish Ministry of Education may provide funding for research projects in the field, while 
those projects may, in reality, be far from being put into practice. The PAULEX project, whose 
findings are hereby succinctly presented, may have contributed to the general knowledge in the area but most likely will 
take a long time (if ever) to be considered worth putting into practice. This article briefly reviews the findings of the 
PAULEX project and suggests reasons to make the web-based University Entrance Examination (IB PAU) real. 

II. Computer-based Language Testing 

Computer-based language testing has been with us since the mid-eighties, as a way to deliver language tests via the 
computer. According to their construct, they can be closer to traditional or adaptive tests (Economides & Roupas, 2008; 
Ockey, 2009; Yen et al, 2010) based on item-response theory (Colpaert, 2004) in which providing the correct response 
to an item leads to a more difficult one and vice-versa. When compared to traditional tests, Computer Assisted 
Language Testing (henceforth CALT) offers a large number of advantages, such as the inclusion of visual media, the 
delivery of spoken items without the need for human resources (Ockey, 2009; Siozos et al., 2009; Tung & Deng, 2006), 
a wider range of integrated and analytical test items, or rapid feedback (immediate when dealing with objective items). 
Additionally, online delivery permits students to take the tests at lower cost from their schools. However, the most 
important effects concern the validity of the test, especially because the test conditions (with very few exceptions)^ 
are the same for all the candidates (Pardo-Ballester, 2010). Roever (2001) asserts that, until recently, those 
tests were more difficult to prepare than traditional ones and also more expensive to produce and maintain. However, recent 
advancements in web-based tests have facilitated the maintenance operations and reduced the number of hours, since 
the same problems may be solved for the whole system instead of checking each individual computer. Besides, each 
computer does not need to have the whole software installed, but may work with just a number of plug-ins. At present, 
most of the important computer-based tests use a combination of traditional and self-corrected items. For instance, 
TOEFL has an integrated skills system (Garcia Laborda, 2006; Meskill & Rangelova, 1995) that allows a chain of tasks 
corresponding to different skills such as speaking-listening-reading-writing (in all possible combinations). 



^ In high-stakes tests, these exceptions are usually well known, such as the security breach in the Turkish OSYM in 2010 
(http://www.todayszaman.com/news-222239-many-security-gaps-found-in-osym-as-police-arrest-four.html) or the TOEFL Internet outage during a test 
in Shanghai, Beijing, Guangzhou and Hangzhou (http://www.guardian.co.uk/education/2006/oct/20/tefl2?INTCMP=SRCH). 
Likewise, the PLEVALEX and PAUER platforms share many features with the TOEFL, although the Spanish platforms 
carry out skills integration in a less sophisticated and more linear fashion. 
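
The adaptive principle described earlier in this section, in which a correct response leads to a more difficult item and an incorrect one to an easier item, can be illustrated with a short sketch. It is only a simplified illustration and does not reproduce the logic of PAULEX, TOEFL or any other platform: the function names, the five difficulty levels and the starting level are assumptions made for the example, and operational adaptive tests based on item-response theory estimate the candidate's ability statistically rather than moving one level at a time.

```python
def next_difficulty(current, correct, lowest=1, highest=5):
    """Return the difficulty level of the next item.

    Assumption for illustration: difficulty is an integer level between
    `lowest` and `highest`; a correct answer moves one level up and an
    incorrect answer moves one level down, without leaving the range.
    """
    step = 1 if correct else -1
    return max(lowest, min(highest, current + step))


def run_adaptive_session(responses, start=3):
    """Return the sequence of difficulty levels presented to a candidate.

    `responses` is a list of booleans (True = correct answer), one per
    item answered; `start` is the hypothetical initial difficulty level.
    """
    level = start
    path = [level]
    for correct in responses:
        level = next_difficulty(level, correct)
        path.append(level)
    return path


if __name__ == "__main__":
    # Two correct answers, one error, then one correct answer.
    print(run_adaptive_session([True, True, False, True]))
```

A traditional (linear) test corresponds to ignoring the responses and presenting the same fixed sequence of items to every candidate.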

III. How Was the Spanish PAULEX Designed? 

This project dates back to the original HIEO/HIELE project started at the Universitat Politecnica de Valencia (UPV), 
where a group of experts noticed the need to have a more efficient testing system to diagnose international students 
arriving at that university. The system consisted of five main modules, including grammar, reading, writing, listening 
and speaking (Garcia Laborda, 2006). The testing delivery system was designed by the experts at the UPV and 
implemented by a company in Madrid. The experimentation was rather limited, but 100 French or English students took 
a test through that system. The results of this positive experience were sent to the Generalitat Valenciana. A second 
version of the platform was implemented only for the writing module. The section, although somewhat limited, was 
implemented and later incorporated into the PLEVALEX testing platform. PLEVALEX was the second and improved 
design of HIEO. After further experimentation, in 2007 the current PAULEX/PAUER platform was implemented. This 
platform became operational in 2009 and tests of English and Spanish were administered. In fact, by mid-2010, 
more than 300 students had taken an Internet-based version of the English or French PAU, both within the university 
campus (and net) system and outside it (Colegio El Pilar and Instituto Benllure of Valencia). The platform had also 
served to train a group of 30 teachers on the integration of computers in the PAU. By the end of 2010, it was clear that 
the IB PAU could be implemented and that students would not reject the testing delivery system. Furthermore, studies 
were conducted to observe their reactions towards the implementation of the PAU through mobile phones and devices 
with promising results. 



IV. What Did We Study in These Experiences? 

As mentioned in the previous section, overall the system combined many research aspects. The research team was 
mainly divided into two sections: language testing and computer/software development. Both teams have spent the last 
few years producing a large amount of information. 

A. Information on Language Testing 

This team varied in size and a number of experts took part in the different aspects of the design and implementation 
stages. This team worked alongside computer engineers who designed the interfaces. Within the three areas that were 
the main intended research subjects, we focused on technology development and its effects on humans. However, 
earlier stages led us to other sub-studies. These studies were necessary to focus the research adequately: 

• Desire and need to change the University Entrance examination. 

• Current problems of high school students in their foreign language classes. 

• Advantages and disadvantages of high stakes testing. 

• Attitudes towards the new oral tasks. 

B. Information on Computers and Their Effect on the Stakeholders 

Obviously, the design of a computer-based test requires a deep study of the interaction between the candidate (human 
user) and the computer. Much of the interaction in the testing situation occurs at the testee interface level (Fulcher, 
2003). Thus, there are two main elements that participate in the final design of the delivery system, which are the 
hardware and the software. The technological development of interfaces in the PAULEX project was based on the results 
reported by Garcia Laborda (2006), which led to an interface design similar to TOEFL (Garcia Laborda, 2006) and 
quite different from other tests like IELTS or BULATS (Garcia Laborda, 2007). On the other hand, the software 
principles and database were based on the principles set by Gimeno Sanz in 2002 and used for the Ingenio language 
teaching and learning platform. As for the PAULEX project, the following are the fundamental design principles that 
were addressed in different projects: 

• Previous interface design, especially Fulcher’s principles (2003). 

• A comparison of different computer-based tests. 

• Effect of image and computer context on the candidates. 

V. Experimental Phase 



A. The Test and Its Implications 

The language testing experimentation began by collecting the teachers’ attitudes towards the still current PAU 
and a new one based on the introduction of speaking and listening tasks. Garcia Laborda & Fernandez Alvarez (2010) 
discovered that teachers were eager to implement a dramatic change, although they also feared the effects of such a 
change. 

We also found that many teachers were excited about the impact of technology, but many felt overwhelmed by the 
excessive responsibility of obtaining high grades in the PAU. All these fears and anxieties were, naturally, due to the 
possibility of including the speaking tasks (listening did not have such an impact on the teachers’ attitudes), which most 
teachers thought would be based on short conversations in pairs and short descriptive monologues. 

In relation to the construction of the test, we found evidence that the change would have an impact 
almost exclusively on the speaking tasks. The results supported the view that, in the long run, due to the washback effect 
(Amengual Pizarro, 2009; Garcia Laborda, 2008; Green, 2007; Hausenberg, 2006; Luxia, 2007; Watanabe, 2004), 
students would improve their speaking ability. They also indicated that converting current PAU tasks into computer tasks 
would make little difference. The last part of the studies conducted in relation to language testing was done in teacher 
training. In order to foresee future effects of the new test, a group of 27 teachers was trained in Valencia in a teacher 
development course for continuing professional development (CPD). These in-service teachers took a 30-hour course 
that included both the development of their capacities as testers and training in the use of technology for education. 
Results indicated that when appropriate training is provided and teachers know when, why and how to test, the effect of 
technology diminishes (Garcia Laborda & Litzler, 2011). 

B. Looking at Computers 

The results were obtained through experimentation of software and, above all, interface information. In some papers, 
the research group found that students would be able to adapt to the new test delivery system (Figs. 1 and 2). 




Figure 1. High school student takes the IB PAU (2008 version). 




Figure 2. Testing conditions in the IB PAU (2008 version). 



As can be seen in the testing interface (Fig. 3), the buttons and the navigation elements were 
reduced and also followed the design principles stated by Fulcher in 2003. This interface design has been reported in a 
number of papers (Garcia Laborda, 2009). 
Figure 3. PAULEX interface design. 



The most significant advances, however, were made in relation to the students’ adaptation to computer-delivered 
language testing. A number of studies showed that, although students may face the test for the first time, they can adapt 
easily to most tasks. Overall, the ones to which students adapt most easily are those that can be answered with a tick or 
response selection (such as multiple choice tasks). Field studies also found that, despite the teachers’ attitudes, most 
difficulties were associated with writing. Other aspects of research included the use of mobile phones for testing through a testing 
emulator (Fig. 4). 




Figure 4. Mobile testing through the use of emulators. 



VI. Discussion and Implications for the Classroom 

The final goal of this project was to obtain evidence of the processes and contexts in which moving the 
University Entrance Examination (PAU) online could be feasible. This research was predominantly practical. Practicality, 
which was not considered initially, emerged as a valuable asset because the crisis of the last two years also requires 
test production and delivery costs to be limited. Additionally, practicality was considered because educational 
boards may also weigh the costs of a face-to-face speaking test held either in the students’ high school or at the 
universities. 

Testing implications 

Computer tests should reduce costs but also improve construct validity in relation to the formats and constructs that 
are used today. However, although the four different skills to be assessed would be interrelated, they also need to be 
analyzed through factor analysis and inter-section correlation when the official trial tests begin (if ever). 
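
As an illustration of the kind of analysis mentioned above, the following sketch computes pairwise Pearson correlations between the section scores of a group of candidates. It is a hedged example only: the function names and the score values are invented for demonstration and do not come from the PAULEX trials, and a full validation would also include the factor analysis, which is not shown here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def section_correlations(scores):
    """Return the correlation for every pair of test sections.

    `scores` maps a section name to the list of scores obtained by the
    same candidates, in the same order, in that section.
    """
    names = list(scores)
    return {
        (first, second): pearson(scores[first], scores[second])
        for i, first in enumerate(names)
        for second in names[i + 1:]
    }


if __name__ == "__main__":
    # Invented scores for five hypothetical candidates (not real data).
    example = {
        "reading": [6.0, 7.5, 5.0, 8.0, 9.0],
        "writing": [5.5, 7.0, 5.5, 7.5, 8.5],
        "listening": [6.5, 6.0, 4.5, 8.5, 9.0],
        "speaking": [5.0, 6.5, 4.0, 7.0, 8.0],
    }
    for pair, value in section_correlations(example).items():
        print(pair, round(value, 2))
```

Very high correlations between two sections would suggest that they measure overlapping abilities, whereas low correlations would suggest that each section contributes distinct information to the construct.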
VII. Conclusions 

Our project and its related studies showed that it is possible and reliable to implement an Internet-based PAU. 
However, this study can only be considered a first approach and the final validation may depend to a large extent on the 
regional educational testing boards’ interest in trialing, redesigning and implementing the testing platform and test 
construct in the near future. Thus, a genuine belief in the tremendous possibilities of this test delivery system is 
indispensable. That means that much work still needs to be done. The project, the most ambitious one in Spain to 
date, requires both technical and research continuation, which seems too difficult to achieve in this period of national 
crisis. Testing advances require an expert definition of the construct with the integration of different items and skills. 
Since developing automatic scoring in writing and, especially, speaking seems too distant for now, professional training 
is necessary (Amengual Pizarro, 2006). Finally, the limited number of projects of this type in Spain may be a 
reason to continue research in this field. 